Accurate emotion recognition using Bayesian model based EEG sources as dynamic graph convolutional neural network nodes

Because emotions affect interactions, interpretations, and decisions, automatic detection and analysis of human emotions from EEG signals plays an important role in the treatment of psychiatric diseases. However, the low spatial resolution of EEG recorders poses a challenge. To overcome this problem, in this paper we model each emotion by mapping from scalp sensors to brain sources using a Bernoulli–Laplace-based Bayesian model. The standardized low-resolution brain electromagnetic tomography (sLORETA) method is used to initialize the source signals in this algorithm. Finally, a dynamic graph convolutional neural network (DGCNN) is used to classify emotional EEG, with the sources of the proposed localization model serving as the underlying graph nodes. In the proposed method, the relationships between the EEG source signals are encoded in the DGCNN adjacency matrix. Experiments on our EEG dataset recorded at the Brain-Computer Interface Research Laboratory, University of Tabriz, as well as on the publicly available SEED and DEAP datasets, show that brain source modeling by the proposed algorithm significantly improves the accuracy of emotion recognition: it achieves a classification accuracy of 99.25% when classifying positive and negative emotions. These results represent an absolute 1–2% improvement in classification accuracy over existing approaches in both subject-dependent and subject-independent scenarios.

www.nature.com/scientificreports/
the localization algorithm used to weight the adjacency matrix of this graph. The results are used in the DGCNN algorithm to classify emotions. The potentials recorded in the electrodes actually represent the superposition of these brain source activities. As a result, it is clear that the information obtained from the localization algorithms is more accurate and efficient than the raw EEG signal information.
In this study, features obtained from the sources extracted by the Bernoulli-Laplace-based Bayesian model are used as the signals on the nodes of a dynamical graph convolutional neural network (DGCNN). By encoding the relations between the EEG source signals in the adjacency matrix, the pattern of activity in different brain areas is used to increase the accuracy of emotion classification. This algorithm allows unseen emotional EEG signals to be classified into negative and positive emotional classes.
The remainder of this paper is organized as follows. Section "Mathematical background" introduces the mathematical background of EEG source localization and dynamical graph convolutional neural networks (DGCNN). Section "Emotional EEG source recognition based on DGCNN" presents the proposed approach for classifying emotional states. Section "Simulation results" reports the results of the proposed method. Finally, the results are discussed.

Mathematical background
In this section, the basic theory of EEG source localization and dynamical graph convolutional neural networks is presented.
EEG source localization. EEG source localization provides spatio-temporal information about the activity of different areas of the brain. In clinical applications, brain source localization improves the non-invasive detection of functional, mental, and even physiological abnormalities related to the brain 27 . In these methods, the sources are considered as several discrete magnetic dipoles in the three-dimensional space of the brain. One of the most common methods in this field is LORETA. Its basic hypothesis is that the current density of a brain source at any point in the cortex is close to the average current density of its neighbors. A major problem with this method is its low spatial resolution and the blurring and scattering of point sources in the resulting images 28 . To address this problem, sLORETA was proposed as a generalization of LORETA based on the current density standardization hypothesis 29 . Since the electric potential at any point on the scalp is a linear combination of the dipole amplitudes of the brain sources, the relationship between the scalp potential and the dipole amplitudes is defined as follows 30,31 :
$$ y = Hx + e, \tag{1} $$
where $y \in \mathbb{R}^N$ is the EEG data of $N$ electrodes and $x \in \mathbb{R}^M$ contains the amplitudes of $M$ dipoles in 3D space. The $N \times M$ lead field matrix $H$ models the propagation of the electromagnetic field from the sources to the sensors, and the noise in the recorded EEG data is modeled as additive white Gaussian noise $e$ 32,33 .
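As an illustration of the linear forward model relating scalp potentials to dipole amplitudes, the following Python sketch simulates scalp data from a sparse source vector. It is only a toy example: the lead field is a random matrix standing in for a real head model, and the sizes `N` and `M`, the active dipole indices, and `noise_std` are assumptions chosen for the example, not values from this study.

```python
import numpy as np

def simulate_eeg(H, x, noise_std, rng):
    """Return y = H x + e, with e drawn as white Gaussian noise."""
    e = noise_std * rng.standard_normal(H.shape[0])
    return H @ x + e

rng = np.random.default_rng(0)
N, M = 21, 200                       # hypothetical numbers of sensors and dipoles
H = rng.standard_normal((N, M))      # random stand-in for a real lead field matrix
x = np.zeros(M)
x[[10, 75, 150]] = [2.0, -1.5, 1.0]  # three active dipoles, all others silent
y = simulate_eeg(H, x, noise_std=0.1, rng=rng)

# With fewer sensors than dipoles, H has rank at most N, so many different
# source vectors x explain the same measurement y.
rank = np.linalg.matrix_rank(H)
print(y.shape, rank)
```

Because the rank of `H` cannot exceed the number of sensors, recovering `x` from `y` requires additional prior information about the sources.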
As mentioned above, the inverse problem is under-determined because the number of EEG sensors is limited while the number of brain sources is large. Additional constraints are therefore needed to obtain a unique solution, and proper regularization is usually required to solve such an ill-posed inverse problem. Solutions based on the usual $\ell_2$ norm have low computational complexity. However, in several cases the actual brain activity is believed to be concentrated in a few focal areas, and in such situations the $\ell_2$ norm overestimates the extent of the active areas. To solve this problem, sparse solutions can be promoted, for example with the $\ell_1$ norm, which is easily handled by optimization techniques. In 34 , an $\ell_0 + \ell_1$ norm is used to enforce sparse source activity (ensuring that only a small number of non-zero elements are present in the solution) while regularizing the non-zero amplitudes of the solution. More precisely, the $\ell_1$ norm limits the amplitudes of the non-zero elements while the $\ell_0$ pseudonorm controls their positions. Using a Bernoulli-Laplace prior, the hybrid $\ell_0 + \ell_1$ norm is introduced in a Bayesian framework. The resulting Bayesian model uses a Markov chain Monte Carlo sampling technique to estimate the model hyperparameters, and it has been shown to favor sparsity.
It is very common in EEG analysis to consider additive white Gaussian noise with variance $\sigma_n^2$ 30 . The unknown parameter vector of model (1) is $\theta = \{x, \sigma_n^2\}$. The priors of these parameters for Bayesian inference are as follows.
(1) Dipole amplitudes prior: An $\ell_0 + \ell_1$ regularization using a Bernoulli-Laplace prior distribution for each element of $x$ is introduced in the Bayesian framework to encourage sparse solutions whose non-zero elements have small amplitudes. The corresponding pdf for the $i$th element of $x$ is
$$ f(x_i \mid \omega, \lambda) = (1 - \omega)\,\delta(x_i) + \frac{\omega}{2\lambda} \exp\left(-\frac{|x_i|}{\lambda}\right), \tag{2} $$
where $\lambda$ is the parameter of the Laplace distribution, $\delta(\cdot)$ is the Dirac delta function, and the weight $\omega$ balances the effects of the Laplace distribution and the Dirac delta function.
(2) Noise variance prior: A noninformative Jeffreys prior is considered for the noise variance:
$$ f(\sigma_n^2) \propto \frac{1}{\sigma_n^2}\, 1_{\mathbb{R}^+}(\sigma_n^2), \tag{3} $$
where $1_{\mathbb{R}^+}(\xi) = 1$ if $\xi \in \mathbb{R}^+$ and 0 otherwise. This is a very common choice for a noninformative prior 35 .
Note that a more informative prior based on the signal-to-noise ratio could also be considered.
The hyperparameter vector of the previous priors is $\Phi = \{\omega, \lambda\}$.
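To give a feel for how a Bernoulli-Laplace prior favors sparsity, the following Python sketch draws independent amplitudes that are exactly zero with probability 1 - omega and Laplace-distributed otherwise. The function name and the values of `omega` and `lam` are illustrative assumptions; this is not the sampler used in the study.

```python
import random

def sample_bernoulli_laplace(m, omega, lam, rng):
    """Draw m amplitudes: 0 with probability 1 - omega, else Laplace with scale lam."""
    samples = []
    for _ in range(m):
        if rng.random() < omega:
            # A zero-mean Laplace draw with scale lam is the difference of
            # two independent exponential draws with mean lam.
            samples.append(rng.expovariate(1.0 / lam) - rng.expovariate(1.0 / lam))
        else:
            samples.append(0.0)
    return samples

rng = random.Random(0)
x = sample_bernoulli_laplace(10000, omega=0.05, lam=1.0, rng=rng)  # assumed values
nonzero = [v for v in x if v != 0.0]
frac_active = len(nonzero) / len(x)
mean_abs = sum(abs(v) for v in nonzero) / len(nonzero)
print(frac_active, mean_abs)
```

In this sketch the fraction of active elements is governed by `omega` (the role of the pseudonorm), while `lam` sets the typical size of the non-zero amplitudes.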
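The joint posterior distribution of the model is obtained from the likelihood and the priors introduced above using the following hierarchical structure:
$$ f(\theta, \Phi \mid y) \propto f(y \mid \theta)\, f(\theta \mid \Phi)\, f(\Phi), \tag{4} $$
where $\{\theta, \Phi\}$ is the vector of model parameters and hyperparameters. Because of the complexity of this posterior distribution, the Bayesian estimators of $\{\theta, \Phi\}$ cannot be computed in closed form. Instead, a Markov chain Monte Carlo (MCMC) method can be used to sample the joint posterior distribution (4), and the generated samples are used to build Bayesian estimators of the unknown model parameters. For this purpose, a Gibbs sampler 35 is considered, which iteratively generates samples from the conditional distributions of (4), i.e., from $f(\sigma_n^2 \mid y, x)$, $f(\lambda \mid x)$, $f(\omega \mid x)$ and $f(x_i \mid y, x_{-i}, \omega, \lambda, \sigma_n^2)$. The likelihood and the prior distribution of $x$ are used to calculate the conditional distribution of each signal element $x_i$. This distribution can be defined as follows:
$$ f(x_i \mid y, x_{-i}, \omega, \lambda, \sigma_n^2) = \omega_{1,i}\,\delta(x_i) + \omega_{2,i}\,N_+(\mu_{i,+}, \sigma_i^2) + \omega_{3,i}\,N_-(\mu_{i,-}, \sigma_i^2), \tag{5} $$
where $N_+$ and $N_-$ denote the truncated Gaussian distributions on $\mathbb{R}^+$ and $\mathbb{R}^-$, respectively. The vector $x$ can be decomposed on the orthonormal basis $B = \{n_1, \ldots, n_M\}$ such that $x = x_{-i} + x_i n_i$, where $x_{-i}$ is obtained by setting the $i$th element of $x$ to 0. Defining $\nu_i = y - Hx_{-i}$ and $h_i = Hn_i$, the weights are defined as
$$ \omega_{l,i} = \frac{u_{l,i}}{\sum_{l=1}^{3} u_{l,i}}, \tag{6} $$
where
$$ u_{1,i} = 1 - \omega, \quad u_{2,i} = \frac{\omega}{2\lambda} \exp\left(\frac{\mu_{i,+}^2}{2\sigma_i^2}\right) \sqrt{2\pi\sigma_i^2}\; C(\mu_{i,+}, \sigma_i^2), \quad u_{3,i} = \frac{\omega}{2\lambda} \exp\left(\frac{\mu_{i,-}^2}{2\sigma_i^2}\right) \sqrt{2\pi\sigma_i^2}\; C(-\mu_{i,-}, \sigma_i^2), \tag{7} $$
and
$$ \sigma_i^2 = \frac{\sigma_n^2}{\|h_i\|^2}, \quad \mu_{i,+} = \sigma_i^2\left(\frac{h_i^T \nu_i}{\sigma_n^2} - \frac{1}{\lambda}\right), \quad \mu_{i,-} = \sigma_i^2\left(\frac{h_i^T \nu_i}{\sigma_n^2} + \frac{1}{\lambda}\right), \quad C(\mu, \sigma^2) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{\mu}{\sqrt{2\sigma^2}}\right)\right]. \tag{8} $$
Dynamical graph convolutional neural network. Network data can be naturally modeled as a graph signal: the underlying network topology is represented by a graph, and the data values are assigned to the graph nodes. An undirected graph is denoted $G = (V, D, W)$, with node set $V = \{1, \ldots, M\}$ and edge set $D \subseteq V \times V$, where $W \in \mathbb{R}^{M \times M}$ is a weighted adjacency matrix that describes the connections between any two nodes in $V$, and $w_{ij}$ is the entry of $W$ in the $i$-th row and $j$-th column. The set of nodes that share an edge with node $i \in V$ is called the neighborhood of node $i$ and is defined as $C_i = \{ j \in V : (j, i) \in D \}$.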
A common signal processing operation for graph data is graph convolution, or spectral graph filtering, which typically relies on the graph Fourier transform (GFT) 36 . Let $L$ denote the Laplacian matrix of the graph $G$, which can be written as
$$ L = S - W, \tag{9} $$
where $S \in \mathbb{R}^{M \times M}$ is a diagonal matrix whose $i$th diagonal element is $S_{ii} = \sum_j w_{ij}$. The GFT of a given signal $x \in \mathbb{R}^M$ is
$$ \hat{x} = U^T x, \tag{10} $$
where $\hat{x}$ is the transformed signal in the frequency domain and $U$ is the orthonormal matrix obtained from the singular value decomposition (SVD) of the graph Laplacian matrix $L$ 37 :
$$ L = U \Lambda U^T. \tag{11} $$
From (10), the inverse GFT is
$$ x = U \hat{x} = U U^T x. \tag{12} $$
For two signals $x$ and $z$, the convolution on the graph, $*_G$, is defined as follows 38 :
$$ x *_G z = U\left( (U^T x) \odot (U^T z) \right), \tag{13} $$
where $\odot$ denotes the element-wise (Hadamard) product.
The optimal adjacency matrix $W^*$ can be learned. The spatial filtering $g(L^*)$ defines the graph convolution of the signal $x$ with the vector $U^* g(\Lambda^*)$, which can be written as
$$ y = g(L^*)\,x = U^* g(\Lambda^*)\, U^{*T} x, \tag{14} $$
where
$$ g(\Lambda^*) = \mathrm{diag}\left( g(\lambda_0^*), \ldots, g(\lambda_{M-1}^*) \right), \tag{15} $$
$L^*$ is computed from $W^*$ using (9), and $\Lambda^* = \mathrm{diag}([\lambda_0^*, \ldots, \lambda_{M-1}^*])$ is a diagonal matrix. Since computing $g(\Lambda^*)$ directly is difficult, a $K$-order Chebyshev polynomial expansion is used to evaluate it efficiently 38 :
$$ g(\Lambda^*) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{\Lambda}^*), \tag{16} $$
where $\tilde{\Lambda}^* = 2\Lambda^* / \lambda_{\max}^* - I_M$, $\theta_k$ are the coefficients of the Chebyshev polynomials, and $T_k(x)$ is calculated recursively as
$$ T_0(x) = 1, \quad T_1(x) = x, \quad T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x), \quad k \ge 2. \tag{17} $$
Therefore, (16) is used to rewrite the graph convolution operation of (14) as
$$ y = U^* g(\Lambda^*)\, U^{*T} x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}^*)\, x, \tag{18} $$
where $\tilde{L}^* = 2L^* / \lambda_{\max}^* - I_M$. The backpropagation (BP) method is used to iteratively optimize the network parameters, which are updated until an optimal or suboptimal solution is attained. To this end, a loss function based on the cross-entropy cost is used.
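The spectral filtering described above can be checked numerically on a small graph. The Python sketch below builds the Laplacian from a random symmetric adjacency matrix, filters a signal once through the graph Fourier domain and once through the Chebyshev recursion applied directly to the rescaled Laplacian, and compares the two. The graph size, the coefficients `theta`, and the helper names are assumptions for illustration; `numpy.linalg.eigh` is used here because, for a symmetric Laplacian, the eigendecomposition provides the orthonormal matrix U.

```python
import numpy as np

def graph_laplacian(W):
    """Laplacian L = S - W, where S is the diagonal matrix of row sums of W."""
    return np.diag(W.sum(axis=1)) - W

def chebyshev_filter(L, x, theta):
    """Apply sum_k theta[k] * T_k(L_tilde) to x using the Chebyshev recursion."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])
    t_prev, t_curr = x, L_tilde @ x            # T_0(L_tilde) x and T_1(L_tilde) x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        y = y + theta[k] * t_curr
    return y

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2.0                  # symmetric weights for an undirected graph
np.fill_diagonal(W, 0.0)
L = graph_laplacian(W)
x = rng.standard_normal(6)
theta = np.array([0.5, 0.3, 0.2])    # hypothetical filter coefficients

# Spectral route: forward GFT, filter the spectrum, inverse GFT.
lam, U = np.linalg.eigh(L)
x_hat = U.T @ x
lam_tilde = 2.0 * lam / lam.max() - 1.0
g = theta[0] + theta[1] * lam_tilde + theta[2] * (2.0 * lam_tilde**2 - 1.0)
y_spectral = U @ (g * x_hat)

# Polynomial route: no eigenvectors are needed to apply the filter.
y_cheb = chebyshev_filter(L, x, theta)
print(np.allclose(y_spectral, y_cheb))
```

The polynomial route only needs matrix-vector products with the rescaled Laplacian, which is why the Chebyshev expansion is attractive when the graph is large.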
To dynamically learn the optimal adjacency matrix $W^*$ of the DGCNN model with the BP method, the partial derivative of the loss function with respect to $W^*$ must be calculated. The update of the optimal adjacency matrix $W^*$ can then be expressed as
$$ W^* = (1 - \rho)\, W^* + \rho\, \frac{\partial \mathrm{Loss}}{\partial W^*}, \tag{19} $$
where $\rho$ is the learning rate of the network.

Emotional EEG source recognition based on DGCNN
This section describes how signals are extracted from the brain sources of emotions, how the DGCNN algorithm is used to classify emotional states, and the data used in this study.

Proposed classification algorithm using DGCNN and Bayesian model based emotional EEG source.
Considering the challenges of feature selection and extraction in previous methods and the need to increase the accuracy of classifying positive and negative emotions, this section presents a method based on EEG source localization and graph theory. Fig. 2 shows a block diagram of the proposed method for classifying the two emotional classes:
• Emotional EEG source localization using the Bernoulli-Laplace-based Bayesian model: The brain sources that generate the EEG signal are estimated using the Bernoulli-Laplace-based Bayesian model. This algorithm is initialized using the sLORETA method.
• Graph generation: Each extracted source signal is used as the graph signal on one graph node. The graph adjacency matrix is weighted by the correlation between the extracted source signals.
• Graph pattern classification using the DGCNN algorithm: The weighted adjacency matrix of the graph built from the extracted source signals is given as input to the DGCNN algorithm for recognizing and classifying emotions.
In this study, the brain areas that are active during two kinds of emotional stimuli are identified using the proposed Bayesian model based on the Bernoulli-Laplace prior. The sLORETA method is applied to initialize the source signals in this algorithm. To compute the sLORETA results, we use the Colin27 brain atlas model from the Montreal Neurological Institute (MNI) and the OpenMEEG BEM head model 39, 40 . The localization solution space is restricted to the gray matter of the cortex. A resolution of 5 mm per voxel with 5614 voxels in MNI coordinates is used for this space. If the number of vertices in the localization solution space increases, the recognition accuracy of the active areas during emotion induction increases. The differences between the active brain sources in the Brodmann areas (BA) of the cerebral cortex 41 obtained for the recorded dataset with sLORETA and with the Bayesian model based on the Bernoulli-Laplace prior are shown in Fig. 3. The lateral views of the active brain areas for subject 1 during positive and negative emotional stimulation obtained with the sLORETA method are shown in Fig. 3a and c, respectively. The corresponding views obtained with the Bayesian model based on the Bernoulli-Laplace prior are presented in Fig. 3b and d, respectively.
In the sLORETA topographic images, areas including the auditory cortex, lingual gyrus, and amygdala, located in the lower and middle temporal cortex and the middle occipital cortex, show the most activity during emotional stimulation of the brain. Based on the sLORETA results, 26 Brodmann regions are considered as regions of interest (ROI) for feature extraction. BA 5, 6, 7, 9, 10, 11, 18, 19, 21, 22, 29, 37, 38, 39, and 40 in both hemispheres are selected as ROI areas (Fig. 4).
However, the Bayesian model based on the Bernoulli-Laplace prior concentrates the active areas and thus reduces their number. Unlike previous methods, this method simplifies the complex pattern of the most active brain areas. Differences in the brain areas that are activated during positive and negative stimuli indicate that a spatial-information-aware classifier can be used to accurately classify emotions. In the proposed method, most activities are seen in BA 19, 37
volume of the proposed method and identify a set of the powerful dipoles and their corresponding neighbors, we calculate the energy of all source signals and then choose all source signals whose power is greater than 50% of the maximum power of the activity amplitude. The signal from each selected source is used as input to the classification algorithm. Sources with less than 50% of the maximum power are discarded to reduce computational complexity. It follows that a graph formed from the source signals can capture the pattern of activity across brain areas during an emotional stimulus and can therefore be used to classify emotions. In this case, there is a graph of brain sources for each emotional stimulus. For this purpose, an adjacency matrix that describes the relationships between nodes is needed. In this matrix, $A_{ij} = A_{ji} = 1$ if there is an edge between node $i$ and node $j$, and $A_{ij} = A_{ji} = 0$ otherwise (Fig. 5).
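The power-based source selection described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that "power" means the mean squared amplitude of each source signal over time; the function name, the synthetic signals, and their amplitudes are hypothetical.

```python
import numpy as np

def select_strong_sources(sources, ratio=0.5):
    """Keep the rows of `sources` (shape M x T) whose mean power exceeds
    `ratio` times the largest mean power among all sources."""
    power = np.mean(sources ** 2, axis=1)      # assumed definition of source power
    keep = np.flatnonzero(power > ratio * power.max())
    return keep, sources[keep]

t = np.arange(500) / 250.0                     # 2 s at an assumed 250 Hz
base = np.sin(2 * np.pi * 10 * t)
amplitudes = np.array([1.0, 0.9, 0.5, 0.1])    # hypothetical source strengths
sources = amplitudes[:, None] * base[None, :]

keep, strong = select_strong_sources(sources)
print(keep, strong.shape)
```

Since power scales with the square of the amplitude, a source at half the peak amplitude has only a quarter of the peak power and falls below the 50% threshold in this example.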
The locations of the vertices in MNI coordinates are considered as graph nodes, and the corresponding source signal is considered as the graph signal on that node. In the proposed approach, the correlation between the signals of the two nodes joined by an edge is used as the initial value of the edge weight. More precisely, the correlation between the i-th source x i and the j-th source x j can be computed as follows: where i, j ∈ {1, ..., M}, t ∈ {1, ..., T}.
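As a rough illustration of correlation-weighted edges, the Python sketch below builds an adjacency matrix from a set of source signals. It assumes the Pearson correlation coefficient, which is one common choice and may differ from the exact correlation measure used in this study; the threshold argument `beta`, the removal of self-loops, and the synthetic signals are likewise assumptions made for the example.

```python
import numpy as np

def correlation_adjacency(sources, beta):
    """Build a thresholded correlation adjacency matrix.

    sources: array of shape (M, T), one row per source signal.
    beta: hypothetical threshold; weights not exceeding it are set to 0.
    """
    W = np.corrcoef(sources)            # Pearson correlation between rows (assumed)
    W = np.where(W > beta, W, 0.0)      # keep only edges whose weight exceeds beta
    np.fill_diagonal(W, 0.0)            # assumption: no self-loops
    return W

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
x0 = np.sin(2 * np.pi * 10 * t)
x1 = x0 + 0.1 * rng.standard_normal(t.size)   # strongly related to x0
x2 = rng.standard_normal(t.size)              # unrelated noise
W = correlation_adjacency(np.vstack([x0, x1, x2]), beta=0.5)
print(np.round(W, 2))
```

In this toy case only the pair of related signals remains connected. In a DGCNN such a matrix would serve only as an initial value, since the adjacency matrix is updated during training.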
Here, we define a threshold $\beta$ such that when $w_{ij} > \beta$, the i-th source is linked with the j-th source in the constructed graph $G$. In this paper, a model based on graph-structured data is proposed to learn and classify EEG source patterns. In DGCNN, the adjacency matrix is updated together with the other model parameters during training, according to (19), in order to learn the relationships between EEG source signals. In the traditional graph convolutional neural network (GCNN), by contrast, the adjacency matrix is fixed before model training. This approach improves the classification results. In the proposed algorithm, the network parameters are iteratively updated to reach optimal or suboptimal solutions according to (16). The structure of the proposed algorithm is shown in Fig. 6; it includes the graph filtering layer, convolutional layers, and one fully connected layer. The detailed procedure of the proposed algorithm is summarized in Algorithm 1.
The 32-channel EEG data were downsampled to 128 Hz, and the EOG was removed by band-pass filtering the data between 4.0 and 45.0 Hz. Then, a 5 s Hamming window without overlap was used to divide each signal into 12 data segments.
Recorded EEG. In the database of the Brain-Computer Interface Research Laboratory, University of Tabriz, Iran, the EEG signals of 16 people without a history of mental illness (6 women and 10 men between 21 and 28 years old) were recorded while they listened to emotional music 42, 43 . A 21-channel Encephalan Medicom device was used to record the EEG signals at a sampling rate of about 250 Hz. The electrodes were arranged on the head according to the international 10-20 system (Fig. 7). Positive and negative emotions were assessed with the Self-Assessment Manikin (SAM) 44 in the questionnaire version and with a 9-point test in the test process. In addition, the participants completed the Beck Depression Inventory (BDI) questionnaire 45 . The SAM results and a description of the BDI test are presented in Table 1. Details of the selected music for each theme are given in Table 2. The order in which the musical stimuli were played to the participants is shown in Fig. 8. A fifteen-second silence is inserted between two pieces of music. A band-pass filter with cut-off frequencies of 0.5 and 70 Hz is used to extract the useful EEG signal information. According to Fig. 8, the neutral class has less data than the positive and negative classes, which causes a class imbalance and may lead to over-fitting. In addition, an imbalance between the classes biases the classification results and decreases accuracy. To solve this problem, all epochs of each emotion are concatenated into one long signal, and rectangular windows with a specific duration and overlap are then applied so that the same number of epochs is obtained for each emotion class. In the proposed method, 5 min of recorded signal per channel (as shown in Fig. 3) is selected for each emotion.
This gives 2 data classes (negative and positive) with 75,000 sample points per channel. The data are then split into 8-s intervals per channel, using the overlap technique to prevent over-fitting.
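The windowing step can be sketched as follows in Python. The function name and the 50% overlap are assumptions for illustration, since the overlap used in the study is not stated here; only the 250 Hz sampling rate, the 5 min duration, and the 8-s window length come from the text.

```python
def sliding_windows(n_samples, fs, win_sec, overlap):
    """Return (start, stop) sample indices of windows lasting win_sec seconds.

    overlap is the fraction of each window shared with the next one
    (0 <= overlap < 1); incomplete trailing windows are dropped.
    """
    win = int(round(win_sec * fs))
    step = max(1, int(round(win * (1.0 - overlap))))
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

# 5 min at 250 Hz gives 75,000 samples per channel, as stated in the text.
n_samples = 5 * 60 * 250
no_overlap = sliding_windows(n_samples, fs=250, win_sec=8, overlap=0.0)
half_overlap = sliding_windows(n_samples, fs=250, win_sec=8, overlap=0.5)  # assumed overlap
print(len(no_overlap), len(half_overlap))
```

Overlapping windows roughly double the number of epochs obtained from the same recording in this example, which is how windowing with overlap can be used to equalize the number of epochs across classes.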

Simulation results
The Brainstorm toolbox in MATLAB R2019a was used to compute the active brain regions with the sLORETA method. These results are used as initial values for the Bayesian model based on the Bernoulli-Laplace prior. The proposed algorithm was implemented in Python with TensorFlow 2.0.0 on a server with an NVIDIA 1080 Ti GPU and an Intel Core i7 CPU. The results of the proposed algorithm for automatic emotion detection are presented in the remainder of this section. Unlike many studies, this study evaluates the proposed method for emotions induced by both music and images. To compare the proposed method fairly with existing state-of-the-art methods, we implement both categories of approaches on our recorded data and on the SEED and DEAP datasets. Sources with less than 50% of a subject's maximum power are eliminated to reduce the computational cost of the algorithm. The proposed method is evaluated in subject-dependent and subject-independent scenarios. In the subject-dependent scenario, 4 out of 10 trials are randomly selected as the training set and the remaining 6 trials form the testing set. In the subject-independent scenario, the data of 40% of the subjects are used for training, 40% for testing, and 20% for validation. Finally, the average accuracy of the proposed method over all subjects is reported.
The accuracies of the proposed method and of the existing methods 11,12,17,21,23,25,26 in the subject-dependent scenario are compared in Fig. 9. The lowest accuracy in this comparison, 67.7%, is obtained by the method in 11 , whereas the method in 26 reaches an average accuracy of 96.87%. Across all subjects, the highest accuracy, 98.95%, is obtained by the proposed method. The proposed method and the methods in 11,12,17,21,23,25,26 are compared in the subject-independent scenario in Fig. 10. Among the existing methods, the best subject-independent accuracy is 95.83%, obtained by the method in 26 , whereas the proposed algorithm achieves 97.91% in this scenario.
These results indicate the robustness of the proposed algorithm against cross-subject variations. The accuracy in the subject-independent scenario is lower than in the subject-dependent scenario because the algorithm is tested on data from subjects not seen during training. The results show that the accuracy of the proposed algorithm in both subject-dependent and subject-independent scenarios is better than that of the methods in 11,12,17,21,23,25,26 . The evaluation results of the proposed method and the existing methods on the SEED and DEAP datasets are presented in Tables 3 and 4, respectively. For the proposed method, the subject-independent accuracy is 98.51% and 98.32%, and the subject-dependent accuracy is 99.25% and 98.96%, on the SEED and DEAP datasets, respectively. Among the existing methods, the highest accuracies are obtained by the method in 26 , with 98.51% and 97.77% for the subject-dependent and subject-independent scenarios, respectively. The accuracy obtained by our proposed method is higher than that of the other methods. As shown in Table 5, when the Bernoulli-Laplace-based Bayesian model is used for source localization, the accuracy of the proposed algorithm is higher than when sLORETA is used. According to Table 5, if a CNN classifier is used instead of DGCNN, the accuracy of the proposed algorithm is lower.
Table 1. Validation of individuals participating in the EEG signal recording process in order to identify positive and negative emotions 43 .
Figure 9. Subject-dependent scenario accuracy of the proposed method as well as the methods presented in 11,12,17,21,23,25,26 for recorded EEG.
Figure 10. Subject-independent scenario accuracy of the proposed method as well as the methods presented in 11,12,17,21,23,25,26 for recorded EEG.

Discussion and conclusion
In this study, we propose an algorithm based on DGCNN and EEG sources to recognize emotions. A mapping from scalp sensors to brain sources is performed to extract the pattern of each emotion using a Bayesian model based on the Bernoulli-Laplace prior. The results of the sLORETA method are used to initialize this model. In the proposed method, a DGCNN is used to classify emotional EEG, with the sources of the Bayesian model based on the Bernoulli-Laplace prior serving as the underlying graph signals. Finally, emotional EEG signals are divided into negative and positive emotional classes using this approach. The proposed method is compared with existing standard methods in subject-independent and subject-dependent experiments on our emotional EEG dataset and on the DEAP and SEED datasets. Feature extraction from EEG data is a major challenge in all previous methods. In this study, to solve this problem, the spatio-temporal information of emotional EEG sources is encoded in a graph. The DGCNN algorithm is then used to classify these graphs. Using the proposed approach, acceptable accuracy is obtained without the need to design a feature extraction process. According to the results, the proposed technique makes the brain areas involved in emotion processing more focused. Significant differences can be seen in the areas involved during the induction of positive and negative emotions, which significantly increases the accuracy of emotion classification. Another aspect of the proposed method is the updating of the adjacency matrix in the DGCNN algorithm, which in itself improves the emotion classification accuracy.
According to previous studies in the field of EEG signal processing 46 , increasing the number of electrodes used to record the signal improves classification accuracy. However, using high-density EEG sensor arrays in a clinical or field environment is costly and time-consuming. In this study, we use source localization to increase the spatial information in EEG recordings. The spatial resolution of EEG recordings can be increased by increasing the number of sources, which carry rich spatio-temporal information. As reported in the results section, the accuracies of the proposed method in the subject-dependent and subject-independent scenarios are 99.25% and 98.51%, respectively. These accuracies are higher than those obtained by existing state-of-the-art methods.
Using video or music video to induce emotions involves the visual and memory areas in addition to the areas related to emotion processing 11,12,17,21,23,25,26 . Considering this, together with the results of music-based emotion induction in this study, auditory induction appears to be an easier and more appropriate way to induce emotions. In this study, the weight of each graph edge is determined by calculating the correlation between the source signals on the graph. In future studies, other features of the graph signals can be used as criteria for calculating the edge weights.